Following the discovery of the first significant samples of galaxies at $z > 6.5$ with Wide
Field Camera 3/Infrared (WFC3/IR) on board Hubble Space Telescope (HST), it has been
claimed that the faintest high-redshift galaxies display extremely blue ultraviolet (UV)
continuum slopes, with a UV power-law index $\beta \simeq -3$ (where $f_\lambda \propto \lambda^{\beta}$).
Such slopes are bluer than previously reported for any other galaxy population, and are most
readily explained theoretically by extinction-free, young, very low-metallicity stellar
populations with a high ionizing photon escape fraction. Here we undertake a critical study
of the evidence for such extreme values of $\beta$, combining three new WFC3/IR-selected
samples of galaxies spanning nearly two decades in UV luminosity over the redshift range
$z \simeq 4.5 - 8$. We explore the impact of inclusion/exclusion of less robust high-redshift
candidates, and use the varying depths of the samples to explore the effects of noise and
selection bias at a given ultraviolet luminosity. Simple data-consistency arguments suggest
that artificially blue average values of $\beta$ can result when the analysis is extended into
the deepest $\simeq 0.5$ mag bin of these WFC3/IR-selected galaxy samples, regardless of the
actual luminosity or redshift range probed. By confining attention to robust high-redshift
galaxy candidates, with at least one 8-$\sigma$ detection in the WFC3/IR imaging, we find that
the average value of $\beta$ is consistent with $\langle\beta\rangle = -2.05 \pm 0.10$ over the
redshift range $z = 5 - 7$, and the UV absolute magnitude range $-22 < M_{\rm UV,AB} < -18$,
and that $\langle\beta\rangle$ shows no significant trend with either redshift or $M_{\rm UV}$.
We create and analyse a set of simple end-to-end simulations based on the WFC3/IR+ACS Hubble
Ultra Deep Field (HUDF) and Early Release Science datasets which demonstrate that a bias
towards artificially low/blue average values of $\beta$ is indeed "expected" when the UV slope
analysis is extended towards the source detection threshold, and conclude that there is as
yet no clear evidence for UV slopes significantly bluer than $\beta \simeq -2$, the typical
value displayed by the bluest star-forming galaxies at more modest redshifts. A robust
measurement of $\langle\beta\rangle$ for the faintest galaxies at $z \simeq 7$ (and indeed
$z \simeq 8$) remains a key observational goal, as it provides a fundamental test for high
escape fractions from a potentially abundant source of reionizing photons. This goal is
achievable with HST, but requires still deeper WFC3/IR imaging in the HUDF.

Key words: galaxies: high-redshift - galaxies: evolution - galaxies: formation - galaxies: 
starburst - cosmology: reionization 
1 INTRODUCTION

The first galaxies, by definition, are expected to contain very young 
stellar populations of very low metallicity. However, the possibility 
of detecting unambiguous observable signatures of such primordial
stellar populations with current or indeed planned future instrumentation
is currently a matter of considerable debate.

For example, one long-sought distinctive spectral signature of
the first generation of galaxies is relatively strong He II emission
at $\lambda_{\rm rest} = 1640$ Å (e.g. Shapley et al. 2003, Nagao et al. 2008, di
Serego Alighieri et al. 2008). However, near-infrared spectroscopy
of the sensitivity required to detect this line at $z > 7$ will certainly
not be available until the James Webb Space Telescope (JWST), and
even then some theoretical predictions indicate that it is unlikely to
be found in detectable objects (Salvaterra, Ferrara & Dayal 2011,
but see also Pawlik, Milosavljevic & Bromm 2011).

By necessity, therefore, recent attention has focussed on
whether the broad-band near-infrared photometry which has now
been successfully used to discover galaxies at $z \simeq 6.5 - 8.5$
(e.g. McLure et al. 2010; Oesch et al. 2010; Bouwens et al. 2010a;
Bunker et al. 2010; Finkelstein et al. 2010; Vanzella et al. 2011) can
actually be used to establish the rest-frame continuum slopes of the
highest redshift galaxies. Specifically, very young, metal-poor stellar
populations are arguably expected to result in substantially bluer
continuum slopes around $\lambda_{\rm rest} \simeq 1500$ Å than have been detected
to date in galaxies discovered at any lower redshift $z < 6.5$ (e.g.
Steidel et al. 1999; Meurer et al. 1999; Adelberger & Steidel 2000;
Ouchi et al. 2004; Stanway et al. 2005; Bouwens et al. 2006; Hathi
et al. 2008; Bouwens et al. 2009; Erb et al. 2010).

It has become the normal convention to parameterise the ultraviolet
continuum slopes of galaxies in terms of a power-law index,
$\beta$, where $f_\lambda \propto \lambda^{\beta}$ (e.g. Meurer et al. 1999; thus, $\beta = -2$
corresponds to a source which has a flat spectrum in terms of $f_\nu$, and
hence has zero colour in the AB magnitude system). As discussed
by several authors, while the bluest galaxies observed at $z \simeq 3 - 4$
have $\beta \simeq -2$, values as low (i.e. blue) as $\beta = -3$ can in principle
be produced by a young, low-metallicity stellar population
(e.g. Bouwens et al. 2010b; Schaerer 2002). However, for this idealized
prediction to actually be realized in practice, several conditions
have to be satisfied simultaneously, namely i) the stellar population
has to be very young (e.g. $t < 30$ Myr for $Z \simeq 10^{-3}\,Z_\odot$,
or $t < 3$ Myr for $Z \simeq 10^{-2}\,Z_\odot$), ii) the starlight must obviously
be completely free from any significant dust extinction, and iii) the
starlight must also not be significantly contaminated by (redder)
nebular continuum (a condition which has important implications
for UV photon escape fraction, and hence reionization - see, for
example, Robertson et al. 2010).
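
As a short worked step (our own filling-in, not spelled out in the text), the statement that $\beta = -2$ corresponds to a flat $f_\nu$ spectrum and zero AB colour follows from converting $f_\lambda$ to $f_\nu$ for a pure power law. Here $\lambda_1 < \lambda_2$ denote the effective wavelengths of two filters, and $m_1$, $m_2$ the corresponding AB magnitudes:

```latex
\begin{align}
  f_\nu &= \frac{\lambda^{2}}{c}\, f_\lambda \;\propto\; \lambda^{\beta + 2}, \\
  m_1 - m_2 &= -2.5 \log_{10}\!\left[\frac{f_\nu(\lambda_1)}{f_\nu(\lambda_2)}\right]
             = -2.5\,(\beta + 2)\,\log_{10}\!\left(\frac{\lambda_1}{\lambda_2}\right).
\end{align}
```

For $\beta = -2$ the colour vanishes, and for $\beta < -2$ the colour $m_1 - m_2$ is negative (blue), as expected. This assumes an idealized power-law continuum sampled at single effective wavelengths, ignoring the finite filter widths.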

For this reason, the recent report by Bouwens et al. (2010b)
(supported to some extent by Finkelstein et al. 2010) that the
faintest galaxies detected at $z > 6.5$ do indeed display an average
value of $\langle\beta\rangle = -3.0 \pm 0.2$ is both exciting and arguably surprising
enough to merit further detailed and independent investigation.
This is especially the case because some authors are already beginning
to assume that the existence of such extreme blue slopes
is a robust result, already ripe for detailed theoretical interpretation
(e.g. Taniguchi et al. 2010).

The aim of this paper is to carefully assess whether the current
HST WFC3 data do indeed provide clear evidence for such
extremely blue slopes in faint galaxies at $z \simeq 7$. There are a number
of potentially subtle biases which can affect the determination
of UV continuum slopes from the WFC3/IR data, especially when,
as is inevitably the case for the faintest objects, the results have to
be based on the average colours of galaxies whose individual $\beta$ values
have associated errors which can be as large as $\Delta\beta \simeq \pm 1.5$.
To check for, and attempt to quantify, the extent of any such biases
we adopt two different approaches in this paper. First, we
take advantage of the dynamic range offered by the available public
WFC3/IR imaging to explore how derived values of $\beta$ (and average
values $\langle\beta\rangle$) depend on galaxy candidate robustness and signal:noise
ratio as we approach the flux limit of a given survey. Second, we
undertake and analyse a set of fairly simple (but complete end-to-end)
simulations to explore what apparent values of (and trends in)
$\langle\beta\rangle$ would be deduced from the existing WFC3/IR data for different
assumed input values of $\beta = -2, -2.5, -3$ combined with realistic
estimates of the faint-end slope of the $z \simeq 7$ galaxy luminosity
function.

The layout of this paper is as follows. First, in Section 2 we
briefly review how we have selected three new, high-redshift galaxy
samples from the WFC3/IR+ACS+IRAC imaging of the Hubble
Ultra Deep Field (HUDF), the HUDF Parallel Field 2 (HUDF09-2),
and the Early Release Science imaging (ERS) of the northern
portion of GOODS-South. The reduction of the HST data, the
deconfusion of the Spitzer IRAC data, and the extraction, analysis,
classification and redshift estimation of the galaxies uncovered
from this imaging are described in detail in McLure et al. (2011),
as this underpins the extraction of a new robust galaxy sample at
$6 < z < 8.7$ which is the focus of the McLure et al. (2011) study.
In this study we retain not only the robust $z > 6$ sources detailed in
McLure et al. (2011), but all galaxies from the larger parent sample
with acceptable redshift solutions at $z > 4.5$, which are classified
as either ROBUST or UNCLEAR. This allows us to explore trends
in $\beta$ over a reasonably wide range in redshift ($5 < z < 8$) and UV
luminosity ($-22 < M_{\rm UV,AB} < -18$), and also to explore potential
biases introduced by the exclusion or inclusion of galaxies with
less robust photometric redshifts. In Section 3 we explain how we
determined the rest-frame UV continuum slope, $\beta$, for the galaxies
extracted from the different imaging datasets at different redshifts.
Then, in Section 4 we present and analyse our results, and demonstrate
what level of data quality is actually required to achieve internally
consistent results between the different galaxy samples uncovered
from surveys of varying depths. We move on to describe
and analyse our simulations in Section 5 before discussing the implications
of our findings in Section 6. A summary of our conclusions
is then presented in Section 7. All magnitudes are quoted in
the AB system (Oke & Gunn 1983) and any cosmological calculations
assume $\Omega_M = 0.3$, $\Omega_\Lambda = 0.7$, and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$.



2 GALAXY SAMPLES 
2.1 Basic sample production 

The candidate galaxies were all selected from our own reductions
of the publicly available near-infrared WFC3/IR imaging of the
HUDF, ERS and HUDF09-2 fields, as described in McLure et al.
(2011) (we note that the HUDF WFC3/IR imaging is the same year-1,
2009, imaging as utilised in McLure et al. 2010). In brief, candidate
selection in all three fields was undertaken by first selecting
sources with SEXTRACTOR (Bertin & Arnouts 1996) down to a
deep signal:noise limit in each of the WFC3/IR $Y_{105}/Y_{098}$, $J_{125}$
and $H_{160}$ images, and then forming the superset of near-infrared
selected sources by merging these catalogues.

Then, as again detailed in McLure et al. (2011), photometric
redshifts (with associated probability distributions) were derived
for all potential sources based on 0.6-arcsec diameter flux-density
measurements made on the available HST ACS optical imaging,
the WFC3/IR imaging, and the Spitzer IRAC imaging (after deconfusion
of the IRAC images based on the WFC3/IR $H_{160}$ or $J_{125}$
data).

The samples were then culled to retain only sources with an
acceptable solution at redshift $z > 4.5$ (i.e. redshift solutions with
a formally acceptable value of $\chi^2$, typically $\chi^2 < 10$ given the
number of data points and model free parameters). All candidate
objects were then visually inspected, and rejected from the catalogue
if they lay too near to the perimeter of the imaging, or too
close to bright sources (a cull that is reflected in the effective survey
areas quoted by McLure et al. 2011). Any remaining suspected
pseudo sources arising from image artefacts were also removed at
this stage.

Figure 1. Plots of $\beta$ versus redshift, $z$, for all sources in the HUDF sample (left), the HUDF09-2 sample (centre), and the ERS sample (right). Filled symbols
indicate ROBUST sources, open symbols indicate UNCLEAR sources which have acceptable alternative low-redshift solutions.

Figure 2. Plots of $\beta$ versus UV absolute magnitude, $M_{1500}$, for all sources in the HUDF sample (left), the HUDF09-2 sample (centre), and the ERS sample
(right). Filled symbols indicate ROBUST sources, open symbols indicate UNCLEAR sources which have acceptable alternative low-redshift solutions.



Finally, all of the ACS+WFC3/IR+IRAC spectral energy distribution
(SED) fits were inspected, and the sources classified as
either ROBUST or UNCLEAR depending on whether the alternative
low-redshift solution could be excluded at $> 2$-$\sigma$ on the basis
of $\Delta\chi^2 > 4$. We note here that the ratio of ROBUST:UNCLEAR
sources varies substantially between the fields, being $\simeq$ 2:1 in the
HUDF, $\simeq$ 1:1 in the ERS, and $\simeq$ 1:2 in HUDF09-2. This is primarily
due to the variation in the depths of the available optical ACS
imaging, relative to the new WFC3/IR near-infrared imaging, as
discussed further below.
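
The classification rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and the default threshold of $\chi^2 < 10$ for an "acceptable" fit is the typical value quoted in the text, not a fixed rule of the actual pipeline.

```python
from typing import Optional


def classify_candidate(chi2_high_z: float,
                       chi2_low_z: float,
                       chi2_max: float = 10.0,
                       delta_chi2_min: float = 4.0) -> Optional[str]:
    """Classify a candidate from the chi-squared of its two redshift solutions.

    chi2_high_z : chi-squared of the primary high-redshift solution
    chi2_low_z  : chi-squared of the best alternative low-redshift solution
    Returns None if the high-redshift fit itself is not acceptable.
    """
    if chi2_high_z > chi2_max:
        return None  # no acceptable high-redshift solution: not in the sample
    # The low-redshift alternative is excluded at > 2 sigma when delta chi2 > 4.
    delta_chi2 = chi2_low_z - chi2_high_z
    return "ROBUST" if delta_chi2 > delta_chi2_min else "UNCLEAR"


print(classify_candidate(3.0, 9.5))   # low-z solution strongly disfavoured
print(classify_candidate(3.0, 5.0))   # low-z solution still acceptable
```

Deeper optical imaging sharpens the Lyman break and so raises $\Delta\chi^2$, which is why the ROBUST fraction tracks the ACS depth in each field.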



Absolute rest-frame UV magnitudes, $M_{1500}$, have been calculated
for all objects by integrating the spectral energy distribution
of the best-fitting evolutionary synthesis model (see McLure et al.
2011) through a synthetic "narrow-band" filter of rest-frame width
100 Å and correcting to total magnitude (from the 0.6-arcsec aperture
magnitudes on which the SED fitting was based) via subtraction
of a global aperture correction of 0.25 mag.
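
To give a feel for the magnitudes involved, the following sketch converts an aperture magnitude to an approximate absolute UV magnitude using the cosmology adopted in this paper. It is a simplification, not the procedure described above: instead of integrating the best-fitting SED through a 100 Å synthetic filter, it assumes a flat $f_\nu$ spectrum (so the k-correction is just $2.5\log_{10}(1+z)$). All function names are hypothetical.

```python
import math

C_KM_S = 299792.458          # speed of light in km/s
H0 = 70.0                    # km/s/Mpc, as adopted in the text
OMEGA_M, OMEGA_L = 0.3, 0.7  # flat Lambda-CDM, as adopted in the text


def luminosity_distance_mpc(z: float, steps: int = 2000) -> float:
    """Luminosity distance in Mpc via trapezoidal integration of 1/E(z)."""
    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + OMEGA_L)

    h = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    total += sum(inv_e(i * h) for i in range(1, steps))
    comoving = (C_KM_S / H0) * total * h
    return (1.0 + z) * comoving


def distance_modulus(z: float) -> float:
    """Distance modulus, 5 log10(D_L / 10 pc)."""
    return 5.0 * math.log10(luminosity_distance_mpc(z) * 1.0e6 / 10.0)


def approx_absolute_uv_mag(m_aperture: float, z: float,
                           aperture_correction: float = 0.25) -> float:
    """Approximate M_UV from an aperture AB magnitude (flat f_nu assumed)."""
    m_total = m_aperture - aperture_correction  # correct to total magnitude
    return m_total - distance_modulus(z) + 2.5 * math.log10(1.0 + z)


print(round(approx_absolute_uv_mag(28.5, 7.0), 1))
```

Under these assumptions an aperture magnitude of 28.5 at $z = 7$ corresponds to an absolute magnitude fainter than $-19$, which is the regime where the faintest HUDF sources lie.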



2.2 HUDF 



In the HUDF the high-redshift galaxy sample reported by McLure
et al. (2010) has now been superseded by the galaxy sample extracted
by McLure et al. (2011). The new parent sample utilised
here includes Spitzer IRAC detections/limits in the selection process,
and extends to lower redshift to include all objects with an
acceptable primary redshift solution at $z > 4.5$.

The resulting HUDF sample contains a total of 147 candidate
galaxies with $z_{\rm phot} > 4.5$. Within this master sample, 95 sources
are considered ROBUST according to the criterion that the alternative
lower-redshift solution can be rejected with better than 2-$\sigma$
confidence (i.e. $\Delta\chi^2 > 4$). The relatively high fraction of robust
high-redshift sources in this field reflects in large part the extreme
depth of the associated optical ACS imaging in the HUDF, which
helps to establish the robustness of any potential Lyman breaks.

As shown in Figs 1 and 2, the final HUDF galaxy sample
at $z > 4.5$ extends to $z > 8$, and samples a rest-frame UV luminosity
range corresponding to $-21 < M_{1500} < -17$ (AB).
However, with one exception, ROBUST sources are confined to
$-21 < M_{1500} < -18$ (AB).
2.3 HUDF09-2

The WFC3/IR imaging of the HUDF09-2 (HUDF Parallel 2) field
utilised here is only 0.07 mag shallower than the 2009 HUDF
WFC3/IR imaging in the $J_{125}$ band, and covers a similar area. The
extracted parent sample is thus comparable in size, but due to the
shallower depth of available optical ACS imaging, the ratio of
ROBUST:UNCLEAR sources is much lower (see McLure et al.
2011).

The HUDF09-2 sample used here contains 135 candidate
galaxies with $z_{\rm phot} > 4.5$. Within this master sample, 49 sources
are considered ROBUST according to the criterion that the alternative
lower-redshift solution can be rejected with better than 2-$\sigma$
confidence (i.e. $\Delta\chi^2 > 4$).

As shown in Figs 1 and 2, the final HUDF09-2 galaxy sample
at $z > 4.5$ again extends to $z > 8$, and samples a rest-frame UV
luminosity range corresponding to $-21 < M_{1500} < -17$ (AB).
However, in this case, ROBUST sources are basically confined to
$-21 < M_{1500} < -18.5$ (AB).

2.4 ERS 

The ERS WFC3/IR imaging of the northern portion of GOODS-South
covers an area $\simeq 10$ times larger than each of the above-mentioned
ultra-deep fields, but is typically a magnitude shallower.
In addition, the $Y_{098}$ filter was utilised in the ERS observations,
rather than $Y_{105}$, making the Y-band imaging even shallower. Because
our galaxy selection does not involve specific colour cuts,
this does not complicate our redshift completeness (cf. Oesch et
al. 2010, Bouwens et al. 2010a) but this, in combination with optical
data limited to GOODS depth, does mean that about half of the
ERS sample is classified as UNCLEAR.

The ERS sample used here contains 337 candidate galaxies
with $z_{\rm phot} > 4.5$. Within this master sample, 160 sources are
considered ROBUST according to the criterion that the alternative
lower-redshift solution can be rejected with better than 2-$\sigma$
confidence (i.e. $\Delta\chi^2 > 4$).

As shown in Figs 1 and 2, the final ERS galaxy sample at
$z > 4.5$ extends to $z > 8$, and samples a brighter rest-frame UV
luminosity range corresponding to $-22 < M_{1500} < -19$ (AB).
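
As a quick arithmetic check, the approximate ROBUST:UNCLEAR ratios quoted in Section 2.1 can be recovered from the sample sizes given in Sections 2.2 to 2.4. The sketch below uses only those quoted numbers; the variable and function names are our own.

```python
# (total candidates, ROBUST candidates) as quoted in Sections 2.2-2.4
SAMPLES = {
    "HUDF": (147, 95),
    "HUDF09-2": (135, 49),
    "ERS": (337, 160),
}


def robust_to_unclear(total: int, robust: int) -> float:
    """Ratio of ROBUST to UNCLEAR sources, where UNCLEAR = total - ROBUST."""
    return robust / (total - robust)


for field, (total, robust) in SAMPLES.items():
    ratio = robust_to_unclear(total, robust)
    print(f"{field}: {robust} ROBUST, {total - robust} UNCLEAR, ratio {ratio:.2f}")
```

The resulting ratios (roughly 1.8, 0.6 and 0.9) are consistent with the approximate values of 2:1, 1:2 and 1:1 quoted for the HUDF, HUDF09-2 and ERS respectively.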



3 MEASUREMENT OF UV CONTINUUM SLOPES 

As already mentioned, the standard convention is to characterise
the rest-frame UV continuum slope via a power-law index, $\beta$,
where $f_\lambda \propto \lambda^{\beta}$.

Given the effective wavelengths of the WFC3/IR filters
of interest here ($Y_{098}$: $\lambda_{\rm eff} = 9864$ Å; $Y_{105}$: $\lambda_{\rm eff} = 10552$ Å;
$J_{125}$: $\lambda_{\rm eff} = 12486$ Å; $H_{160}$: $\lambda_{\rm eff} = 15369$ Å) the relevant
conversions from AB mag colours to $\beta$ are:

$\beta = 4.43\,(J_{125} - H_{160}) - 2$    (1)

$\beta = 5.47\,(Y_{105} - J_{125}) - 2$    (2)

$\beta = 3.91\,(Y_{098} - J_{125}) - 2$    (3)

Choosing between the latter two options is dictated by which
Y-band filter was used in the observations, but otherwise the choice
is determined by the estimated redshift of the source. Specifically,
at $z_{\rm est} > 6.5$, both the Lyman-break and any potential Lyman-$\alpha$
emission can enter the Y-band (which cuts in at 9000 Å), thus
contaminating any measure of $\beta$. Given the uncertainties in $z_{\rm est}$
we therefore use equation (1) for any source with $z_{\rm est} > 6.5$, and
equations (2) or (3) as appropriate at lower redshift (to ensure we
sample comparable rest-frame wavelengths at all redshifts).

Figure 3. Plots of $\beta$ versus redshift $z$, and $\beta$ versus UV absolute magnitude
$M_{1500}$, for all the ROBUST sources in the 3 samples (blue=HUDF,
green=HUDF09-2, red=ERS).
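
The coefficients in equations (1) to (3), and the redshift rule for choosing between them, can be sketched as follows. The coefficient formula assumes a pure power law sampled at the quoted effective wavelengths, in which case $\beta = (m_1 - m_2) / [2.5 \log_{10}(\lambda_2/\lambda_1)] - 2$; the function names and the dictionary-based interface are hypothetical choices for illustration.

```python
import math

# Effective wavelengths in Angstrom, as quoted in the text
EFFECTIVE_WAVELENGTH = {
    "Y098": 9864.0,
    "Y105": 10552.0,
    "J125": 12486.0,
    "H160": 15369.0,
}


def colour_coefficient(blue_band: str, red_band: str) -> float:
    """Coefficient k in beta = k * (m_blue - m_red) - 2 for a pure power law."""
    ratio = EFFECTIVE_WAVELENGTH[red_band] / EFFECTIVE_WAVELENGTH[blue_band]
    return 1.0 / (2.5 * math.log10(ratio))


def measure_beta(mags: dict, z_est: float, z_switch: float = 6.5) -> float:
    """Estimate beta from AB magnitudes, choosing the colour by redshift.

    Above z_switch the Y-band is contaminated by the Lyman break, so
    J125 - H160 is used (equation 1); otherwise Y - J125 is used, with the
    Y filter being whichever of Y105 / Y098 was observed (equations 2, 3).
    Sources at z_est > 8 are not treated specially here, although the text
    cautions that their beta values should not be taken seriously.
    """
    if z_est > z_switch:
        blue, red = "J125", "H160"
    else:
        blue = "Y105" if "Y105" in mags else "Y098"
        red = "J125"
    return colour_coefficient(blue, red) * (mags[blue] - mags[red]) - 2.0


for pair in [("J125", "H160"), ("Y105", "J125"), ("Y098", "J125")]:
    print(pair, round(colour_coefficient(*pair), 2))
```

The printed coefficients agree with those of equations (1) to (3) to two decimal places, which confirms that the equations follow directly from the effective wavelengths.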

We also note that, at $z_{\rm est} > 8$, both the Lyman-break and
any potential Lyman-$\alpha$ emission can enter the $J_{125}$-band (which
cuts in at 11000 Å), and hence, at least for the current photometric
dataset, any values of $\beta$ derived for sources at $z_{\rm est} > 8$ should not
be taken seriously (this is why neither Bouwens et al. (2010b) nor
Finkelstein et al. (2010) attempted measurement of $\beta$ at $z \simeq 8$).
However, in reality, after application of the galaxy sample quality
control described below, only one galaxy candidate at $z_{\rm est} > 8$
survives (in the HUDF09-2 field) for inclusion in the calculation
of average values of $\beta$. Finally, we note that equation (1) above
differs very slightly from the relation adopted by Bouwens et al.
(2010b), which is $\beta = 4.29\,(J_{125} - H_{160}) - 2$, but the differences
in derived values of $\beta$ are completely insignificant in the current
context (e.g. for $J_{125} - H_{160} = -0.2$, the Bouwens et al. relation
yields $\beta = -2.86$, while equation (1) yields $\beta = -2.89$).

Figure 4. Plot of $\beta$ versus observed $J_{125}$ for all sources in the 3 samples
with $z_{\rm est} > 6$ (blue=HUDF, green=HUDF09-2, red=ERS). Filled symbols
indicate ROBUST sources while open symbols indicate UNCLEAR
sources which have acceptable alternative low-redshift solutions. Apparently
extremely blue sources with $\beta < -3$ occur at different magnitude
ranges for the different samples.

The other point to note from Fig. 2 is that all three samples
contain ROBUST sources with apparent values of $\beta$ as extreme as
$\beta < -5$. We have not plotted error bars on individual $\beta$ values in
Fig. 2, but as discussed and plotted in McLure et al. (2011), these
individual values are of course highly uncertain, with $\Delta\beta \simeq \pm 1.5$
or even larger. The fact that photometric uncertainty must play a
major role in producing these extreme values of $\beta$ is revealed in
Fig. 2 by the fact that the plume of extremely blue/low $\beta$ values
emerges at a different absolute magnitude in the different samples.

To illustrate this more clearly we plot all three samples of ROBUST
sources together in Fig. 3. It can be seen that very blue
sources with apparent values of $\beta < -4$ emerge in the ERS
sample at $M_{1500} \simeq -20$, whereas in the deeper HUDF data the
emergence of such apparently extreme sources is delayed until
$M_{1500} > -18.7$. The impact on the scatter in $\beta$ of approaching
the flux-density limit is perhaps illustrated more clearly in Fig. 4
where we confine attention to $z > 6$, and plot $\beta$ versus observed
$J_{125}$ magnitude.

Of course, while photometric scatter inevitably causes the apparent
range of $\beta$ to rapidly increase as the sample flux limit is
approached, there are also a large number of galaxies in the final
luminosity bin probed by each sample, so moderately accurate
measures of the average value $\langle\beta\rangle$ can in principle still be derived.
Therefore the key issue, which we now explore, is how the
average value $\langle\beta\rangle$ is affected by the inclusion or exclusion of
ROBUST/UNCLEAR sources, and also the signal:noise ratio limit
of the galaxy catalogue.
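
The size of the uncertainty on an individual $\beta$ value follows directly from equation (1) and the photometric errors in the two bands. The sketch below uses the standard first-order approximation $\sigma_m \simeq 1.0857 / ({\rm S/N})$ and adds the two magnitude errors in quadrature; this is our own illustrative estimate (it assumes Gaussian, uncorrelated errors and is crude at low signal:noise), not the error analysis of McLure et al. (2011).

```python
import math

POGSON = 2.5 / math.log(10.0)  # ~1.0857: converts fractional flux error to mag


def magnitude_error(snr: float) -> float:
    """First-order magnitude uncertainty for a detection of given S/N."""
    return POGSON / snr


def beta_error(snr_blue: float, snr_red: float,
               coefficient: float = 4.43) -> float:
    """Approximate 1-sigma error on beta from a two-band colour.

    coefficient is the colour-to-beta factor, 4.43 for J125 - H160 (eq. 1).
    """
    colour_error = math.hypot(magnitude_error(snr_blue),
                              magnitude_error(snr_red))
    return coefficient * colour_error


print(round(beta_error(5.0, 5.0), 2))  # two 5-sigma detections
print(round(beta_error(8.0, 8.0), 2))  # two 8-sigma detections
```

For two 5-$\sigma$ detections this gives an uncertainty of roughly 1.4 in $\beta$, comparable to the $\Delta\beta \simeq \pm 1.5$ quoted for the faintest sources, while for two 8-$\sigma$ detections it falls to roughly 0.85.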

4 RESULTS 
4.1 Raw results 

In Figs 1 and 2 we plot the raw values of $\beta$ for each source in the
HUDF, HUDF09-2 and ERS samples versus redshift, $z$, and UV
absolute magnitude, $M_{1500}$. In each plot the sources classified as
ROBUST are indicated by the filled symbols, and those classified as
UNCLEAR are indicated by the open symbols. As well as illustrating
the range of redshift and UV luminosity probed by each galaxy
sample, these plots dramatically illustrate what an extreme range
of apparent individual values of $\beta$ results from the photometric
uncertainty in colour, especially in the fainter luminosity bins probed
by each sample. In general it can be seen that a large fraction of
the faintest galaxies are classified as UNCLEAR. This simply reflects
the fact that galaxies detected with relatively low signal:noise
ratio in WFC3/IR, even if completely undetected at shorter wavelengths,
do not display sufficiently strong breaks in their SEDs to
rule out an alternative low-redshift solution (i.e. a Balmer break
rather than a Lyman break). However, it is also apparent that, even
at brighter magnitudes, a large fraction of the reddest galaxies, with
$\beta > -1$, are classified as UNCLEAR. This simply reflects a lack of
compelling evidence that the continuum above the putative Lyman
break is blue enough to rule out a lower-redshift (possibly dusty)
solution. We return to this point later in the paper, when discussing
the results of the simulations in Sections 5 and 6.



4.2 Quality control 

In Fig. 5 we have divided the three-field sample into 3 redshift bins,
and binned the data into luminosity bins 1 magnitude wide. We
then plot the average value of $\beta$, $\langle\beta\rangle$, and in each panel show
the sample-to-sample variation and the final overall average and
standard error.

The top row of plots shows the results as computed from all
the individual data points shown in Figs 1 and 2, including both
ROBUST and UNCLEAR sources. The second row shows the effect
of including only the ROBUST sources. Finally, the third row
shows the effect of further limiting the ROBUST samples to contain
only objects which have at least one 8-$\sigma$ near-infrared detection in
the WFC3/IR data.
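
The three levels of sample selection and the binned averaging can be sketched as below. The `Source` record, the level names and the function names are hypothetical; the sketch simply averages $\beta$ in 1-mag bins with integer edges and reports the standard error of the mean, skipping bins with a single object (for which no meaningful average can be formed).

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Source:
    m_uv: float      # absolute UV magnitude
    beta: float      # measured UV slope
    robust: bool     # True for ROBUST, False for UNCLEAR
    best_snr: float  # highest signal:noise in any WFC3/IR band


def passes(source: Source, level: str) -> bool:
    """Apply one of the three selection levels described in the text."""
    if level == "all":
        return True
    if level == "robust":
        return source.robust
    if level == "robust_8sigma":
        return source.robust and source.best_snr >= 8.0
    raise ValueError(f"unknown selection level: {level}")


def binned_mean_beta(sources, level: str, bin_width: float = 1.0) -> dict:
    """Return {bin centre: (mean beta, standard error)} for the given level."""
    bins = defaultdict(list)
    for s in sources:
        if passes(s, level):
            centre = (math.floor(s.m_uv / bin_width) + 0.5) * bin_width
            bins[centre].append(s.beta)
    return {
        centre: (mean(betas), stdev(betas) / math.sqrt(len(betas)))
        for centre, betas in sorted(bins.items())
        if len(betas) > 1  # a single object gives no meaningful average
    }
```

Calling `binned_mean_beta` with `"all"`, `"robust"` and `"robust_8sigma"` in turn, separately for each redshift bin, would correspond to the top, middle and bottom rows of such a comparison.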

This set of plots reveals some interesting trends in the data
which can be summarized as follows. First, it can be seen that restricting
the sample to ROBUST sources only has the general effect
of moving $\langle\beta\rangle$ to lower (i.e. bluer) values at virtually all redshifts
and luminosities. Some of this effect may be due to the (desirable)
removal of lower-redshift interlopers from the high-redshift galaxy
samples. However, some of this shift could also be due to a bias
introduced by the fact that a blue UV continuum slope increases
the chance that a source will be classified as ROBUST, especially
in the absence of a high signal:noise ratio break. This move to
the blue is most dramatic in the faintest magnitude bin; taken at
face value the middle row of plots in Fig. 5 suggests that galaxies
with $M_{\rm UV,AB} \simeq -18.5$ have a typical UV continuum slope
which evolves rapidly with redshift, with $\langle\beta\rangle \simeq -2.2$ at $z \simeq 5$,
$\langle\beta\rangle \simeq -2.5$ at $z \simeq 6$, and $\langle\beta\rangle \simeq -3$ at $z \simeq 7$. This is clearly
fairly similar to the trend reported by Bouwens et al. (2010b).

However, it is also very clear that, especially in the faintest
luminosity bins, there are huge sample-to-sample variations with,
for example, the ERS sample delivering very blue values of $\langle\beta\rangle$
in the $M_{\rm UV,AB} \simeq -19.5$ bin at $z \simeq 7$ compared to either of
the deeper HUDF and HUDF09-2 samples. This is basically the
effect of the plume to low values of $\beta$ seen at the ERS flux limit as
shown in Fig. 4. To reconcile the results from the different samples
in the luminosity bins in which they overlap, we found it necessary
to insist on a minimum signal:noise ratio requirement. To avoid
introducing any further colour bias we simply chose to insist that
every object retained in the final, refined sample was detected in at
least one WFC3/IR near-infrared passband at a minimum level of 8-$\sigma$.
The impact of this further level of quality control is then shown
in the bottom row of plots in Fig. 5. No longer do the different
samples deliver substantially different average values of $\langle\beta\rangle$, and
it can be seen that the very low values of $\langle\beta\rangle$ were indeed largely
resulting from the lowest signal:noise ratio sources. Importantly,
with this level of further quality control, we are left with only one
HUDF object in the $M_{\rm UV,AB} \simeq -18.5$ bin at $z \simeq 7$, and hence
cannot plot a meaningful average value of $\langle\beta\rangle$. At $z = 6$, where
we can still probe this luminosity bin, the evidence for $\langle\beta\rangle$ being
significantly bluer than $\langle\beta\rangle = -2$ has disappeared.

Figure 5. Plots of average $\langle\beta\rangle$ versus UV absolute magnitude, at redshift $z \simeq 5$, 6 & 7 for all sources (top row), for ROBUST sources (middle row), and with
the additional requirement of at least one 8-$\sigma$ near-infrared detection (bottom row). In all panels the small coloured dots show the averages derived from each
individual field (blue=HUDF, green=HUDF09-2, red=ERS) while the overall average and standard error are indicated by the black hexagons and error bars.

This final result is summarized in Fig. 6, where we overplot
the dependence of $\langle\beta\rangle$ on $M_{\rm UV,AB}$ as a function of redshift. The
derived datapoints shown in Fig. 6 are tabulated in Table 1. Clearly,
these results are consistent with $\langle\beta\rangle = -2$ over the full redshift
and luminosity range which can be probed with these data. At the
bright end they are also in good agreement with the results derived
by both Bouwens et al. (2010b) and Finkelstein et al. (2010), so any
disagreement is really confined to $M_{\rm UV} > -20$.

Of course, it might be argued that by insisting on rejecting
the lowest signal:noise ratio sources, we have effectively "thrown
away" the "evidence" for how $\beta$ behaves at the faintest luminosities
at the highest redshifts. However, for all the data consistency arguments
outlined above, we have good reason to suspect that when
$\langle\beta\rangle$ in a given luminosity bin depends purely on low signal:noise
ratio detections, its average value may be seriously biased. To explore
whether this suspicion is fully justified, and to quantify the
likely magnitude of any such effect, we now describe the creation
and analysis of a set of simple simulations.

Table 1. Derived average $\beta$ values and standard errors as a function of absolute
UV magnitude and redshift, as calculated using the robust sample of
sources with the additional requirement of at least one 8-$\sigma$ near-infrared
detection. The final column simply gives the average value of $\langle\beta\rangle$, further
averaged over the redshift range $z = 5 - 7$, including values from each redshift
bin where this is available. The data given here are plotted in Fig. 6.



5 SIMULATIONS 
5.1 Simulation design 

To explore and attempt to explain the origin of any apparent bias
towards excessively blue values of $\beta$ as derived from sources extracted
with only $\simeq 4 - 5$-$\sigma$ photometry, we undertook a set of relatively
simple simulations. Specifically, we decided to insert into
the ERS and HUDF HST (ACS+WFC3) and Spitzer IRAC images
a population of galaxies at $z \simeq 7$, with a chosen fixed intrinsic
value of $\beta$, and then reclaimed these sources using exactly the same
methodology as used to extract and refine the real high-redshift
galaxy sample (i.e., including initial use of SExtractor, full ACS-WFC3-IRAC
photometry from the real images, derivation of $\chi^2$
versus $z_{\rm phot}$, high-redshift sample refinement requiring a statistically
acceptable solution at $z_{\rm phot} > 6$, and final branding as ROBUST
or UNCLEAR depending on whether $\Delta\chi^2 > 4$ between the
low-redshift and high-redshift solutions).

We created three simulations, one in which all galaxies were
assigned template SEDs with $\beta = -2$, one with all galaxies having
$\beta = -2.5$, and an extreme simulation with $\beta = -3.0$. Perhaps the
key feature of our simulations is that we inserted a galaxy population
which extended substantially below the nominal flux limits of
the images, following the form of the McLure et al. (2010) $z \simeq 7$
luminosity function down to $J_{125} = 30$ (AB). This is vital to properly
simulate the effect of "flux-boosting" of some subset of the
numerous faint sources into the final galaxy sample. The point here
is that, while completely erroneous 5-$\sigma$ sources are extremely unlikely,
the random flux boosting of, for example, 3-$\sigma$ sources to
$\simeq 5$-$\sigma$ in either $J_{125}$ or $H_{160}$ can be relatively common when extracting
a flux-limited sample down to the $\simeq 4 - 5$-$\sigma$ limit in the
presence of steep number counts. Moreover, such flux boosting is
highly likely to be accompanied by a significant distortion in derived
colour, in excess of that "expected" from adding the formal
photometry errors in quadrature (because it is extremely unlikely
that a source would fall on, for example, a 2-$\sigma$ positive noise peak
in both the $J_{125}$ and $H_{160}$ images).
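
The flux-boosting argument can be illustrated with a deliberately crude Monte Carlo toy. It is not the image-level simulation described here: it assumes (our choices, for illustration only) power-law number counts $dN/dS \propto S^{-2.5}$ extending down to 1-$\sigma$, equal Gaussian noise in both bands, selection on the observed $J_{125}$ flux alone at 5-$\sigma$, and equation (1) for the slope. No images, photometric redshifts or ROBUST/UNCLEAR classification are involved.

```python
import math
import random


def toy_flux_boosting(n_sources: int = 200_000, true_beta: float = -2.0,
                      seed: int = 1) -> dict:
    """Mean recovered beta near the detection limit and for bright sources.

    Fluxes are in units of the per-band 1-sigma noise.
    """
    rng = random.Random(seed)
    # For f_nu ~ lambda^(beta+2), the true H160/J125 flux ratio is:
    h_over_j = (15369.0 / 12486.0) ** (true_beta + 2.0)
    recovered = {"faint": [], "bright": []}
    for _ in range(n_sources):
        true_j = rng.paretovariate(1.5)      # dN/dS ~ S^-2.5 for S >= 1
        obs_j = true_j + rng.gauss(0.0, 1.0)
        obs_h = true_j * h_over_j + rng.gauss(0.0, 1.0)
        if obs_j < 5.0 or obs_h <= 0.0:      # 5-sigma cut in J125 only
            continue
        colour = -2.5 * math.log10(obs_j / obs_h)   # J125 - H160 (AB mag)
        beta = 4.43 * colour - 2.0                  # equation (1)
        if obs_j < 6.0:
            recovered["faint"].append(beta)         # just above the limit
        elif obs_j >= 15.0:
            recovered["bright"].append(beta)        # well above the limit
    return {name: sum(v) / len(v) for name, v in recovered.items()}


print(toy_flux_boosting())
```

Under these assumptions the sources just above the detection limit should come out noticeably bluer on average than the input $\beta = -2$, while the bright sources should stay close to it, because objects scattered upwards in the selection band are not, in general, scattered upwards in the other band.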
Figure 6. The upper panel shows final average $\langle\beta\rangle$ versus UV absolute
magnitude, $M_{1500}$, at redshift $z \simeq 5$ (blue), $z \simeq 6$ (green), and $z \simeq 7$
(red). Values, along with standard errors in the mean, are plotted for any
redshift/luminosity bin where the quality control described in the text left
more than one source. The lower panel simply shows the average of the
data-points shown in the upper panel, and thus provides an average value of
$\langle\beta\rangle$ for each luminosity bin, over the redshift range $z = 5 - 7$, including
values from each redshift bin where this is available. These data are tabulated
in Table 1. There is no clear evidence for any trend with either luminosity
or redshift, and all values are consistent with $\beta = -2$.



5.2 Simulation results 

We created many realizations for both the ERS and HUDF fields.
A typical outcome is shown in Fig. 7, for both the $\beta = -2$
and $\beta = -2.5$ input catalogues. As in the earlier figures, ERS
"sources" are indicated in red, HUDF "sources" in blue, and ROBUST
and UNCLEAR high-redshift galaxies are indicated by solid
and open symbols respectively.

While comparison with the earlier figures is complicated by
the fact that Figs 1-3 include galaxies over the redshift range
$z = 5 - 8$, and inevitably contain some genuine low-redshift interlopers
(at least among the open symbols), it can be seen that Fig. 7
reproduces the key features displayed by the real data in, for example,
Fig. 3. Specifically, even for input $\beta = -2$, both the ERS and
HUDF simulated samples yield galaxies with apparent values of $\beta$
as blue as $\beta \simeq -5$ in the faintest luminosity/magnitude bin probed
by each sample. In addition, several of these apparently ultra-blue
sources are classified as ROBUST.

© 0000 RAS, MNRAS 000, 000-000

J. S. Dunlop et al.

By contrast, while artificially red sources are also produced
by the photometric uncertainties, ultra-red sources are much
less prevalent, and red ROBUST sources are very rare (only one
ROBUST source in this simulation is retrieved with $\beta > -1$).

The effect of these distributions of retrieved β values on the 
average deduced value of ⟨β⟩ as a function of UV luminosity is 
shown in Fig. 8, again for both the β = -2 and β = -2.5 input 
scenarios. 

The upper panel of Fig. 8 is remarkably similar to the z ~ 7 
points plotted in Fig. 1 of Bouwens et al. (2010b), and to those 
given in Fig. 6 of Finkelstein et al. (2010). Here the analysis of our 
β = -2 simulation has resulted in an entirely artificial, apparently 
monotonic luminosity dependence of ⟨β⟩, with ⟨β⟩ approaching 
-3 in the faintest luminosity bin. Only in the brightest bin has the 
true input value of β been successfully retrieved. 

It is important to stress that the fact we recover ⟨β⟩ ~ -2.4 
at M_UV ~ -19.5 does not contradict the value of ⟨β⟩ ~ -2.12 
we measured from the z ~ 7 data in this bin, as given in Fig. 6 and 
Table 1. As already discussed, to try to minimize bias, these measurements 
were limited to objects with at least one > 8-σ detection 
in the near-infrared, and even the M_UV ~ -19.5 luminosity bin 
contains some less significant detections which can bias the result 
to the blue unless filtered out. Our simulation therefore implies 
that, with the depth of WFC3/IR data analysed here, unless such 
quality control is applied, a true β ~ -2 will result in an accurately 
measured ⟨β⟩ = -2 at M_UV ~ -20.5, a somewhat biased 
measurement of ⟨β⟩ ~ -2.4 at M_UV ~ -19.5, and a severely 
biased measurement of ⟨β⟩ ~ -3 at M_UV ~ -18.5. It follows that the 
apparent dependence of β on M_UV reported by both Bouwens et al. (2010b) 
and Finkelstein et al. (2010) (from the same depth of data) is at least 
not inconsistent with a true value of β ~ -2, independent of luminosity. 

The lower panel in Fig. 8 simply shows how even bluer values, 
with apparent ⟨β⟩ < -3, inevitably result when the input value of 
β is -2.5. However, this scenario is clearly inconsistent with the data: 
the input value of β = -2.5 is of course correctly recovered from the 
simulation in the brightest luminosity bin, and this disagrees 
with the observed value of β = -2 at M_UV = -20.5. 

Interestingly, the retrieved value of ⟨β⟩ in the faintest luminosity 
bin is not the full 0.5 lower in the lower panel of Fig. 8 as 
compared to the upper panel. This implies that one cannot easily 
correct for the bias in a unique way, and that a measured value of 
⟨β⟩ ~ -3 in this luminosity bin could be consistent with a true 
β = -2 or β = -2.5 within the errors. This simply reinforces the 
need to improve the depth of the WFC3/IR data to enable higher 
signal-to-noise measurements of β in this crucial faint luminosity bin 
at z ~ 7. 
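
The sensitivity of β to signal-to-noise can be illustrated with a minimal sketch. It assumes that Equation (1), which is not reproduced in this section, takes the form β = 4.43 (J125 - H160) - 2, a form chosen because it reproduces the values quoted in Section 5.3. It also assumes independent Gaussian errors in the two bands and the first-order approximation σ_mag ≈ 1.086 / (S/N). The function names are illustrative only.

```python
import math

# Assumed form of Equation (1): beta = 4.43 * (J125 - H160) - 2.
BETA_PER_MAG = 4.43


def mag_error(snr):
    """First-order 1-sigma magnitude error for a detection of given S/N."""
    return 2.5 / math.log(10) / snr


def beta_error(snr_j, snr_h):
    """1-sigma error on beta from independent J125 and H160 errors."""
    colour_error = math.hypot(mag_error(snr_j), mag_error(snr_h))
    return BETA_PER_MAG * colour_error


for snr in (5, 8, 16):
    print(f"S/N = {snr:2d} in both bands: sigma_beta ~ {beta_error(snr, snr):.2f}")
```

Under these assumptions a source detected at 5-σ in both bands has a random error on β of roughly 1.4, falling to roughly 0.85 at 8-σ, so individual values near the detection limit scatter by more than the difference between β = -2 and β = -3.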



5.3 The origin of "β bias" 

To explore the origin of the "β" bias so clearly displayed by the 
analysis of our simulations, we take advantage of the fact that the 
"true" input UV luminosity of every simulated galaxy is known, 
and explore how derived β relates to the level of "flux-boosting" experienced 
by the simulated sources. This is shown in Fig. 9, where 
Figure 7. Plots of β versus UV absolute magnitude, M1500, as extracted 
from the ERS (red) and HUDF (blue) WFC3/IR images for the z = 7 
simulated source population described in Section 5. The sources plotted in 
the upper panel were all input with β = -2, while the sources in the lower 
panel all had β = -2.5. The simulated source population inserted into 
the real images extended down to an input J125 magnitude of 30 (AB), 
following the form of the z ~ 7 luminosity function derived by McLure 
et al. (2010). Sources were then extracted and analysed in exactly the same 
way as the real sources; as in Figs 1 and 2, open circles denote UNCLEAR 
sources which have acceptable high-redshift solutions, but where the low-redshift 
alternative cannot be formally excluded, while the solid symbols 
denote ROBUST sources in which the alternative lower-redshift solution 
can be excluded at ~ 2-σ. 



the extracted β value for all of the reclaimed ERS and HUDF high-redshift 
β = -2 simulated galaxies is plotted against UV luminosity 
(= J-band) flux boost, in magnitudes (here, a positive value of 
"Boost" means the reclaimed J125 magnitude is brighter than the 
input value by the plotted magnitude difference). 

Both the ERS and HUDF simulated galaxies behave in the 
same way and show that the extremely blue values of β almost all 
result from sources which have entered the sample because their 
Figure 8. Plots of average ⟨β⟩ versus UV absolute magnitude, M1500, deduced 
from the individual β values shown in Fig. 7 as reclaimed for the 
two alternative simulated z ~ 7 galaxy populations. The upper plot shows 
how, without careful quality control, a galaxy population with a true value 
of β = -2, independent of M_UV, yields an apparent luminosity-dependent 
average value of ⟨β⟩ which tends to ⟨β⟩ ~ -3 in the faintest luminosity 
bin from which sources can be selected in the current WFC3/IR HUDF 
data. The similarity between this plot and that presented by Bouwens et al. 
(2010b) (their Fig. 1) and by Finkelstein et al. (2010) (their Fig. 6) is striking. 
The lower plot shows the same information for the simulated β = -2.5 
population. In both plots the correct value of ⟨β⟩ is only reclaimed in the 
brightest bin, and ⟨β⟩ becomes progressively more biased to the blue with 
decreasing luminosity. 



"true" J125 magnitudes have been boosted by a few tenths of a 
magnitude (up to ~ 0.5 magnitudes). This is not really surprising: 
at the faintest limit a substantial fraction of the supposedly 
5-σ sources in the J125 image are significantly flux boosted, and 
random noise dictates that these same sources are unlikely to be 
as extremely flux-boosted at H160. Equation (1) indicates that a 
J125 flux boost of 0.5 mag will lead β = -2 to be distorted to 
β = -4.2, explaining naturally the behaviour displayed in Fig. 9. 
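
The arithmetic behind this distortion can be checked with a short sketch. Equation (1) is not reproduced in this section, so the form used here, β = 4.43 (J125 - H160) - 2, is an assumption, adopted because it returns the quoted β = -4.2 for a 0.5 mag boost in J125. The function name is hypothetical.

```python
# Assumed form of Equation (1): beta = 4.43 * (J125 - H160) - 2.
BETA_PER_MAG = 4.43


def boosted_beta(true_beta, j_boost=0.0, h_boost=0.0):
    """Apparent beta when a band is measured brighter by the given boost (mag).

    A positive boost means the band appears brighter, i.e. its magnitude
    is numerically smaller than the true value.
    """
    true_colour = (true_beta + 2.0) / BETA_PER_MAG      # true J125 - H160
    measured_colour = true_colour - j_boost + h_boost   # boosted J125 - H160
    return BETA_PER_MAG * measured_colour - 2.0


print(f"J125 boosted by 0.5 mag: beta = {boosted_beta(-2.0, j_boost=0.5):.2f}")
print(f"H160 boosted by 0.5 mag: beta = {boosted_beta(-2.0, h_boost=0.5):.2f}")
```

With the same assumed relation, an equal boost in H160 instead moves a β = -2 source to β slightly above 0, corresponding to an apparent colour of J - H = 0.5.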
Figure 9. A plot of derived β for the simulated z ~ 7, β = -2 sources extracted 
from the ERS (red) and HUDF (blue) WFC3/IR images, versus the 
UV luminosity (= J-band) flux boost, where "Boost" expresses in magnitudes 
how much brighter the recovered J-band flux of a given source is as 
compared to its input flux. Extreme apparent values of β arise 
largely from sources in which the true J-band flux has been boosted 
by a few tenths of a magnitude due to noise in the image (e.g. a 3-σ source 
has been boosted to 5-σ). Few sources in which the H-band flux has been 
boosted by comparable amounts relative to the J-band survive the redshift 
selection process, and those that do are generally classified as UNCLEAR 
(indicated here, as in previous plots, by open circles). 



However, what is less clear is why comparable flux-boosting 
in H160 has not produced a comparable population of artificially 
red objects extending to β > 0. Unless the sample is specifically 
J-band selected there is no obvious reason why J125 flux 
boosting should be more prevalent than H160 flux boosting. The 
answer is rather subtle. H-band flux boosting does occur, 
and it can be seen that some sources do indeed have their β values 
over-estimated, up to values approaching β = 0. However, when 
large/red values of β are produced by the noise and flux boosting, 
a significant fraction of the sources start to be classified as low-redshift 
sources by our code, and hence do not appear in Fig. 9 
(which only contains objects with acceptable reclaimed solutions 
at z_phot > 6). Even among those "red" sources that do survive, 
with apparent β ~ -1, it can be seen that few are classified as robust, 
simply because the measured redder J - H colour permits an 
acceptable low-redshift solution. 

In summary, as the scatter inevitably rises in the final luminosity 
bin, and is exacerbated by the effects of flux-boosting of 
fainter sources into the (apparently) flux-limited sample, the process 
of high-redshift galaxy selection can clip the red wing of the 
scattered values, and bias the average ⟨β⟩ value to the blue. This 
effect is not really specific to our particular method of source selection: 
a source scattered by H160 flux boosting to β > 0 will have 
an apparent colour J - H > 0.5, and is therefore less likely to be 
regarded as a secure high-redshift source as selected by standard 
Lyman-break colour selection techniques. By contrast, any artificially 
"blue" galaxy resulting from J125 flux boosting will almost 
always be retained, and indeed is liable to be classified as ROBUST. 
We can of course check the extent to which this has happened 
in our simulated galaxy samples. Indeed, for the HUDF simulation 
shown in Figs 7-9, we find that from an input sample of 82 z = 7 
galaxies, 12 were scattered out of the sample to low redshift due, in 
most cases, to their redder J - H colours. 
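
The clipping of the red wing can be reproduced qualitatively with a toy Monte Carlo. This is a deliberately simplified sketch and not the end-to-end image simulation described above. It assumes Equation (1) has the form β = 4.43 (J125 - H160) - 2, Gaussian noise of unit variance in both bands, detection on a J+H stack at 5-σ, and a hypothetical colour cut J - H < 0.5 standing in for the photometric-redshift selection.

```python
import math
import random

BETA_PER_MAG = 4.43   # assumed slope of Equation (1)
RED_CUT = 0.5         # hypothetical J125 - H160 limit for "high-redshift"


def simulate(n_sources=20000, true_beta=-2.0, seed=1):
    """Return (mean beta of detections, mean beta after red cut, n clipped)."""
    rng = random.Random(seed)
    true_colour = (true_beta + 2.0) / BETA_PER_MAG   # J125 - H160 in mag
    all_beta, kept_beta = [], []
    for _ in range(n_sources):
        flux_h = rng.uniform(2.0, 6.0)               # true H160 flux in sigma units
        flux_j = flux_h * 10 ** (-0.4 * true_colour)
        obs_j = flux_j + rng.gauss(0.0, 1.0)
        obs_h = flux_h + rng.gauss(0.0, 1.0)
        # Detection in the J+H stack, symmetric in the two bands.
        if (obs_j + obs_h) / math.sqrt(2.0) < 5.0:
            continue
        # Floor fluxes so that the colour is always defined.
        obs_j, obs_h = max(obs_j, 0.5), max(obs_h, 0.5)
        colour = -2.5 * math.log10(obs_j / obs_h)
        beta = BETA_PER_MAG * colour - 2.0
        all_beta.append(beta)
        if colour < RED_CUT:                         # red sources are rejected
            kept_beta.append(beta)
    mean_all = sum(all_beta) / len(all_beta)
    mean_kept = sum(kept_beta) / len(kept_beta)
    return mean_all, mean_kept, len(all_beta) - len(kept_beta)


mean_all, mean_kept, n_clipped = simulate()
print(f"all detections: <beta> = {mean_all:.2f}")
print(f"after red cut:  <beta> = {mean_kept:.2f} ({n_clipped} sources clipped)")
```

Because the detection step in this toy is symmetric in J125 and H160, any shift of the mean towards the blue comes purely from discarding the red wing of the scatter. A J-band-weighted detection would add the flux-boosting asymmetry on top of this.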

Finally, for completeness, we show in Fig. 10 the effect of restricting 
our analysis of the z = 7, β = -2 simulation to sources 
with at least one > 8-σ detection with WFC3/IR, in effect replicating 
our final analysis of the real data as presented in Fig. 6. As with 
the real data, applying this level of quality control leaves us unable 
to say anything about β at M_UV ~ -18.5, but interestingly (and 
reassuringly) it also reduces the level of bias in the M_UV ~ -19.5 
luminosity bin to δβ ~ 0.2. Clearly the results presented in Fig. 6 
are completely consistent with β = -2. 



6 DISCUSSION 

Our key results can be summarized as follows. 

First, we find that, at z ~ 5 and z ~ 6, the average value 
of UV slope is perfectly consistent with β = -2 and displays no 
significant luminosity dependence over the UV luminosity range 
-22 < M1500 < -18. Second, we find that the same result appears 
to hold at z ~ 7, over the more restricted luminosity range 
-21 < M1500 < -19, but conclude that no robust statement can 
yet be made about ⟨β⟩ at fainter luminosities at z > 6.5. Third, we 
show, both via data consistency arguments from fields of varying 
depth, and from simple (yet realistic, and end-to-end) simulations, 
that attempting to extend the measurement of average UV slope 
into the faintest available luminosity bin (as determined by ~ 4-5-σ 
detections) yields values of ⟨β⟩ which are biased to the blue, and 
can yield apparent average values as low as ⟨β⟩ ~ -3, even for a 
true input value of β = -2 for every source. 

Thus, while we cannot rule out the recent claims that the 
faintest galaxies yet discovered at z ~ 7 have extremely blue 
slopes, ⟨β⟩ ~ -3, we do show that such extreme values of ⟨β⟩ 
are not found (from the current data) in any luminosity or redshift 
bin where good-quality photometry is available (where "good" here 
means at least one detection in a WFC3/IR band at a significance 
level better than 8-σ). We note here that Finkelstein et al. (2010), 
while reporting raw results on ⟨β⟩ at z ~ 7 (their Fig. 6) very 
similar to those reported by Bouwens et al. (2010b), derive larger 
errors on ⟨β⟩, and conclude that there is as yet no evidence for a 
dependence of ⟨β⟩ on M_UV at z ~ 7. 

This, then, provides a very strong and clear motivation for 
still deeper WFC3/IR imaging in the HUDF, given the need 
to test the astrophysically important possibility that the very 
faintest high-redshift galaxies do display UV slopes significantly 
bluer than ⟨β⟩ = -2.5, with all the associated implications for 
metallicity, age and ionizing photon escape fraction (Bouwens et 
al. 2010b; Robertson et al. 2010). A depth improvement of ~ 0.5 
mag would be sufficient to convert most of the current ~ 5-σ detections 
to ~ 8-σ detections, thus enabling proper exploration of 
⟨β⟩ down to M_UV,AB ~ -18 at z ~ 7. 
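
The quoted conversion from ~ 5-σ to ~ 8-σ follows directly from the magnitude scale, as the short sketch below shows. The exposure-time estimate is an added assumption (background-limited imaging, with noise scaling as the inverse square root of exposure time) and is not a figure taken from the text. The function names are illustrative.

```python
def snr_after_deepening(snr_now, depth_gain_mag):
    """S/N of a fixed-flux source once the noise drops by depth_gain_mag."""
    return snr_now * 10 ** (0.4 * depth_gain_mag)


def exposure_factor(depth_gain_mag):
    """Exposure-time multiple needed, assuming noise scales as 1/sqrt(time)."""
    return 10 ** (0.8 * depth_gain_mag)


print(f"5-sigma source after +0.5 mag depth: {snr_after_deepening(5.0, 0.5):.1f}-sigma")
print(f"exposure time multiple for +0.5 mag: {exposure_factor(0.5):.1f}")
```

A 0.5 mag gain in depth raises a 5-σ detection to about 7.9-σ, and under the stated noise assumption it would need roughly 2.5 times the current exposure time.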

Because of its novelty, and potentially crucial implications for 
reionization, we have focussed most of the above discussion, and 
indeed our simulations, on the measurement of β at z ~ 7. However, 
it is also of interest to assess how our results at z ~ 6 and 
z ~ 5 measure up to previous studies of β at these and also lower 
redshifts. 

The most obvious point of comparison is the major study of 
UV continuum slope over the redshift range 2 < z < 6 carried out 
Figure 10. Reclaimed average value of ⟨β⟩ as a function of UV luminosity 
from the β = -2, z ~ 7 simulation, when attention is restricted to sources 
with > 8-σ detections in J125 and/or H160. Only a very slight bias to 
bluer values of β remains, but as with the real galaxy sample at z ~ 7, 
this restriction to decent quality photometry (relatively unaffected by flux 
boosting) means that with the current data we can say nothing meaningful 
about the sources with M_UV ~ -18.5. To do this requires the current 
HUDF imaging to be deepened by a further ~ 0.5 mag in the relevant 
WFC3/IR wavelength regime. 



by Bouwens et al. (2009). This work presented extremely good evidence 
for a luminosity dependence of β at z ~ 2.5 and z ~ 4. We 
wish to stress that our failure to find any such luminosity dependence 
in β at higher redshifts should not be taken as casting doubt 
on these results at lower redshift. In particular, the evidence presented 
by Bouwens et al. (2009) for a steady decrease in ⟨β⟩ over 
the luminosity range -22 < M_UV < -17 appears compelling, 
based as it is on very large samples of Lyman-break galaxies in 
which β can be determined purely from the optical photometry (i.e. 
from HST ACS i775 - z850 colour). However, this luminosity dependence 
takes the average value of UV slope from ⟨β⟩ ~ -1.3 
at M_UV = -22.2 to ⟨β⟩ ~ -1.9 at M_UV = -17.2, and is attributed 
by Bouwens et al. (2009) primarily to 
decreasing dust obscuration with decreasing UV luminosity (see 
also Stark et al. 2010). Crucially, even at the faintest luminosities 
probed at z ~ 4, the bluest value of ⟨β⟩ reported by Bouwens et al. 
(2009) is ⟨β⟩ = -2.03 ± 0.04 ± 0.15. 
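
For scale, the two end points quoted above imply a mean gradient that is easy to evaluate. The sketch below simply takes the straight-line slope between them. It is not a fit to the Bouwens et al. (2009) measurements, and the function name is illustrative.

```python
def beta_gradient(beta_bright, mag_bright, beta_faint, mag_faint):
    """Mean change in <beta> per magnitude between two luminosities."""
    return (beta_faint - beta_bright) / (mag_faint - mag_bright)


# End points quoted in the text for the lower-redshift relation.
slope = beta_gradient(-1.3, -22.2, -1.9, -17.2)
print(f"d<beta>/dM_UV ~ {slope:.2f} per magnitude")
```

The resulting gradient, about -0.12 per magnitude, is shallow compared with the apparent change of roughly one unit in ⟨β⟩ across two magnitudes seen in the uncorrected z ~ 7 simulation results.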

It is by no means obvious that a decrease in dust content with 
increasing redshift should maintain the slope of the β-M_UV relation, 
simply shifting it to more negative values of β. As already discussed 
here and elsewhere, it is relatively straightforward for 'normal', 
essentially dust-free stellar populations to produce β = -2, 
but the production of significantly bluer slopes requires different 
astrophysics in the form of very young, very low metallicity stellar 
populations, with low levels of nebular emission. 

The results presented by Bouwens et al. (2009) on β at z ~ 5 
and z ~ 6 are inevitably much more uncertain than those at lower 
redshift, in part because they involved the use of NICMOS data in 
the measurement of β. Nevertheless, at the brighter end of the luminosity 
range probed, at M_UV ~ -20.5, our more robust measurements 
support the conclusion of Bouwens et al. (2009) that Lyman 
break galaxies are bluer at z ~ 5-6 than at z ~ 4, with ⟨β⟩ having 
moved from ⟨β⟩ = -1.5 ± 0.15 at z ~ 4 to ⟨β⟩ = -2 ± 0.20 (see 
our Fig. 6 and the values given in Table 4 of Bouwens et al. 2009). 

There is thus no serious doubt that the brighter Lyman break 
galaxies have become significantly bluer with increasing redshift, 
and the idea that this change is primarily due to decreasing dust 
content gains support from the very low (generally negligible) values 
of A_V inferred from the best-fitting SED models at z > 6.5 
deduced by McLure et al. (2011). Therefore the key question now 
is whether the β-M_UV relation essentially plateaus at β ~ -2 
at z > 5 due to the near absence of dust at all luminosities, or 
whether there is indeed evidence for a continuing dependence of 
β on M_UV, albeit perhaps with a different slope. Our own results, 
as shown in Fig. 6, support the former scenario, but as already dis- 
cussed, our analysis also emphasizes the vital importance of deeper 
WFC3/IR data to establish the true values of the typical UV slopes 
of the very faintest galaxies at z ~ 7. 
7 CONCLUSIONS 

We have undertaken a critical study of the evidence for extremely 
blue UV continuum slopes in the highest redshift galaxies, focussing 
on the robust determination of the UV power-law index 
β (where f_λ ∝ λ^β). Our analysis is based on three new WFC3/IR-selected 
samples of galaxies spanning nearly two decades in UV 
luminosity over the redshift range z ~ 4.5-8 (McLure et al. 
2011). We have explored the impact of inclusion/exclusion of less 
robust high-redshift candidates, and have used the varying depths 
of the samples to explore the effects of noise and selection bias at 
a given UV luminosity. Simple data-consistency arguments indicate 
that artificially blue average values of β can result when the 
analysis is extended into the deepest ~ 0.5-magnitude bin of these 
WFC3/IR-selected galaxy samples, regardless of the actual luminosity 
or redshift range probed. By confining attention to robust 
high-redshift galaxy candidates, with at least one 8-σ detection in 
the WFC3/IR imaging, we find that the average value of β is consistent 
with ⟨β⟩ = -2.05 ± 0.10 over the redshift range z = 5-7, and 
the UV absolute magnitude range -22 < M_UV,AB < -18, and 
that ⟨β⟩ shows no significant trend with either redshift or M_UV. 

We have created and analysed a set of simple end-to-end simulations 
based on the WFC3/IR+ACS HUDF and ERS datasets 
which demonstrate that a bias towards artificially low/blue average 
values of β is indeed "expected" when the UV slope analysis is 
extended towards the source detection threshold, and conclude that 
there is as yet no clear evidence for UV slopes significantly bluer 
than β ~ -2, the typical value displayed by the bluest star-forming 
galaxies at more modest redshifts (e.g. NGC1705; β = -2.15). 
A robust measurement of ⟨β⟩ for the faintest galaxies at z ~ 7 
(and indeed z ~ 8) remains a key observational goal, as it provides 
a fundamental test for high escape fractions from a potentially 
abundant source of reionizing photons. This goal is achievable with 
HST, but requires still deeper WFC3/IR imaging in the HUDF. We 
note, however, that, due to degeneracies between escape fraction 
and metallicity, it may prove hard to establish robust evidence for 
a high escape fraction from the measurement of β unless extreme 
values of β = -3 are indeed confirmed for faint galaxies at z > 7 
(in which case both low metallicity and a high escape fraction are 
required). 